LCM ver.3: Collaboration of Array, Bitmap and Prefix Tree for Frequent Itemset Mining
Abstract
For a transaction database, a frequent itemset is an itemset included in at least a specified number of transactions. When finding all frequent itemsets, the heaviest task is computing the frequency of each candidate itemset. Previous studies have used roughly three data structures and algorithms for this computation: bitmaps, prefix trees, and array lists. Each has its own advantages and disadvantages depending on the density of the input database. In this paper, we propose an efficient way to combine these three data structures so that the combination gives the best performance in every case.
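The frequency computation described in the abstract can be illustrated with a minimal sketch. This is not the LCM ver.3 implementation: the toy database and all function names below are hypothetical, the prefix tree is omitted, and only the array-list and bitmap layouts are shown, each counting how many transactions contain a given non-empty itemset.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy database: each transaction is a set of item ids.
TRANSACTIONS: List[Set[int]] = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3, 4}]


def build_tid_lists(db: List[Set[int]]) -> Dict[int, List[int]]:
    """Array-list layout: item -> sorted list of ids of transactions containing it."""
    tids: Dict[int, List[int]] = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            tids.setdefault(item, []).append(tid)  # tids arrive in increasing order
    return tids


def build_bitmaps(db: List[Set[int]]) -> Dict[int, int]:
    """Bitmap layout: item -> integer whose bit `tid` is set if transaction tid has the item."""
    bitmaps: Dict[int, int] = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Two-pointer intersection of two sorted id lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def support_tid_lists(itemset: Iterable[int], tids: Dict[int, List[int]]) -> int:
    """Frequency of a non-empty itemset via repeated list intersection."""
    items = list(itemset)
    common = tids.get(items[0], [])
    for item in items[1:]:
        common = intersect_sorted(common, tids.get(item, []))
    return len(common)


def support_bitmaps(itemset: Iterable[int], bitmaps: Dict[int, int], n: int) -> int:
    """Frequency of an itemset via bitwise AND over n transactions."""
    acc = (1 << n) - 1  # start with every transaction selected
    for item in itemset:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")
```

In this sketch the bitmap costs one bit per transaction per item regardless of content, while an array list costs one entry per occurrence, which is one way to see why the density of the database affects which layout is cheaper.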
Similar resources
LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets
For a transaction database, a frequent itemset is an itemset included in at least a specified number of transactions. A frequent itemset P is maximal if P is included in no other frequent itemset, and closed if P is included in no other itemset included in exactly the same transactions as P. The problems of finding these frequent itemsets are fundamental in data mining, and from the applicatio...
Index-Maxminer: a New Maximal Frequent Itemset Mining Algorithm
Because of the inherent computational complexity, mining the complete frequent itemset in dense datasets remains a challenging task. Mining Maximal Frequent Itemset (MFI) is an alternative to address the problem. Set-Enumeration Tree (SET) is a common data structure used in several MFI mining algorithms. For this kind of algorithm, the process of mining MFI’s can also be viewed as the pro...
Index-CloseMiner: An improved algorithm for mining frequent closed itemset
The set of frequent closed itemsets determines exactly the complete set of all frequent itemsets and is usually much smaller than the latter. This paper proposes an improved algorithm for mining frequent closed itemsets. Firstly, the index array is proposed, which is used for discovering those items that always appear together. Then, by using bitmap, an algorithm for computing index array is pr...
An Efficiently Algorithm for Mining Association Rules
Association rules mining is one of the most important topics in data mining. A new algorithm for mining association rules is proposed in this paper. In data mining, the process of counting any itemset's support requires a great I/O and computing cost. An impacted bitmap technique to speed up the counting process is employed in this paper. Nevertheless, saving the intact bitmap usually has a big...
Mining Frequent Sequences Using Itemset-Based Extension
In this paper, we systematically explore an itemset-based extension approach for generating candidate sequence which contributes to a better and more straightforward search space traversal performance than traditional item-based extension approach. Based on this candidate generation approach, we present FINDER, a novel algorithm for discovering the set of all frequent sequences. FINDER is compo...